Searched hist:0 (Results 201 - 225 of 40935) sorted by relevance

1234567891011>>

/linux-master/drivers/hwtracing/coresight/
H A Dcoresight-cti-core.cdiff 3244fb6d Tue Jan 10 04:07:34 MST 2023 James Clark <james.clark@arm.com> coresight: cti: Prevent negative values of enable count

Writing 0 to the enable control repeatedly results in a negative value
for enable_req_count. After this, writing 1 to the enable control
appears to not work until the count returns to positive.

Change it so that it's impossible for enable_req_count to be < 0.
Return an error to indicate that the disable request was invalid.

Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver")
Tested-by: Jinlong Mao <quic_jinlmao@quicinc.com>
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230110110736.2709917-2-james.clark@arm.com
diff 3244fb6d Tue Jan 10 04:07:34 MST 2023 James Clark <james.clark@arm.com> coresight: cti: Prevent negative values of enable count

Writing 0 to the enable control repeatedly results in a negative value
for enable_req_count. After this, writing 1 to the enable control
appears to not work until the count returns to positive.

Change it so that it's impossible for enable_req_count to be < 0.
Return an error to indicate that the disable request was invalid.

Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver")
Tested-by: Jinlong Mao <quic_jinlmao@quicinc.com>
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230110110736.2709917-2-james.clark@arm.com
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
diff 3dc228b3 Wed Nov 23 12:38:18 MST 2022 Mike Leach <mike.leach@linaro.org> coresight: cti: Fix null pointer error on CTI init before ETM

When CTI is discovered first then the function
coresight_set_assoc_ectdev_mutex() is called to set the association
between CTI and ETM device. Recent lockdep fix passes a null pointer.

This patch passes the correct pointer.

Before patch: log of boot oops sequence with CTI discovered first:

[ 12.424091] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.483474] coresight cti_sys0: CTI initialized
[ 12.488109] coresight cti_sys1: CTI initialized
[ 12.503594] coresight cti_cpu0: CTI initialized
[ 12.517877] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.523479] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.529926] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.541808] coresight stm0: STM32 initialized
[ 12.544421] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.585639] coresight cti_cpu1: CTI initialized
[ 12.614028] coresight cti_cpu2: CTI initialized
[ 12.631679] CSCFG registered etm0
[ 12.633920] coresight etm0: CPU0: etm v4.0 initialized
[ 12.656392] coresight cti_cpu3: CTI initialized

...

[ 12.708383] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000348

...

[ 12.755094] Internal error: Oops: 0000000096000044 [#1] SMP
[ 12.761817] Modules linked in: coresight_etm4x(+) coresight_tmc coresight_cpu_debug coresight_replicator coresight_funnel coresight_cti coresight_tpiu coresight_stm coresight
[ 12.767210] CPU: 3 PID: 1346 Comm: systemd-udevd Not tainted 6.1.0-rc3tid-v6tid-v6-235166-gf7f7d7a2204a-dirty #498
[ 12.782827] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 12.793154] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 12.800010] pc : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.806694] lr : coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]

...

[ 12.885064] Call trace:
[ 12.892352] coresight_set_assoc_ectdev_mutex+0x30/0x50 [coresight]
[ 12.894693] cti_add_assoc_to_csdev+0x144/0x1b0 [coresight_cti]
[ 12.900943] coresight_register+0x2c8/0x320 [coresight]
[ 12.906844] etm4_add_coresight_dev.isra.27+0x148/0x280 [coresight_etm4x]
[ 12.912056] etm4_probe+0x144/0x1c0 [coresight_etm4x]
[ 12.918998] etm4_probe_amba+0x40/0x78 [coresight_etm4x]
[ 12.924032] amba_probe+0x11c/0x1f0

After patch: similar log

[ 12.444467] cs_system_cfg: CoreSight Configuration manager initialised
[ 12.456329] coresight-cpu-debug 850000.debug: Coresight debug-CPU0 initialized
[ 12.456754] coresight-cpu-debug 852000.debug: Coresight debug-CPU1 initialized
[ 12.469672] coresight-cpu-debug 854000.debug: Coresight debug-CPU2 initialized
[ 12.476098] coresight-cpu-debug 856000.debug: Coresight debug-CPU3 initialized
[ 12.532409] coresight stm0: STM32 initialized
[ 12.533708] coresight cti_sys0: CTI initialized
[ 12.539478] coresight cti_sys1: CTI initialized
[ 12.550106] coresight cti_cpu0: CTI initialized
[ 12.633931] coresight cti_cpu1: CTI initialized
[ 12.634664] coresight cti_cpu2: CTI initialized
[ 12.638090] coresight cti_cpu3: CTI initialized
[ 12.721136] CSCFG registered etm0

...

[ 12.762643] CSCFG registered etm1
[ 12.762666] coresight etm1: CPU1: etm v4.0 initialized
[ 12.776258] CSCFG registered etm2
[ 12.776282] coresight etm2: CPU2: etm v4.0 initialized
[ 12.784357] CSCFG registered etm3
[ 12.785455] coresight etm3: CPU3: etm v4.0 initialized

Error can also be triggered by manually starting the modules using modprobe
in the following order:

root@linaro-developer:/home/linaro/cs-mods# modprobe coresight
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-cti
root@linaro-developer:/home/linaro/cs-mods# modprobe coresight-etm4x

Tested on Dragonboard DB410c
Applies to coresight/next

Fixes: 23722fb46725 ("coresight: Fix possible deadlock with lock dependency")
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221123193818.6253-1-mike.leach@linaro.org
/linux-master/drivers/media/mc/
H A Dmc-request.cdiff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff e30cc79c Sun Jun 21 05:30:40 MDT 2020 Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi> media: media-request: Fix crash if memory allocation fails

Syzbot reports a NULL-ptr deref in the kref_put() call:

BUG: KASAN: null-ptr-deref in media_request_put drivers/media/mc/mc-request.c:81 [inline]
kref_put include/linux/kref.h:64 [inline]
media_request_put drivers/media/mc/mc-request.c:81 [inline]
media_request_close+0x4d/0x170 drivers/media/mc/mc-request.c:89
__fput+0x2ed/0x750 fs/file_table.c:281
task_work_run+0x147/0x1d0 kernel/task_work.c:123
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:165 [inline]
prepare_exit_to_usermode+0x48e/0x600 arch/x86/entry/common.c:196

What led to this crash was an injected memory allocation failure in
media_request_alloc():

FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
should_failslab+0x5/0x20
kmem_cache_alloc_trace+0x57/0x300
? anon_inode_getfile+0xe5/0x170
media_request_alloc+0x339/0x440
media_device_request_alloc+0x94/0xc0
media_device_ioctl+0x1fb/0x330
? do_vfs_ioctl+0x6ea/0x1a00
? media_ioctl+0x101/0x120
? __media_device_usb_init+0x430/0x430
? media_poll+0x110/0x110
__se_sys_ioctl+0xf9/0x160
do_syscall_64+0xf3/0x1b0

When that allocation fails, filp->private_data is left uninitialized
which media_request_close() does not expect and crashes.

To avoid this, reorder media_request_alloc() such that
allocating the struct file happens as the last step thus
media_request_close() will no longer get called for a partially created
media request.

Reported-by: syzbot+6bed2d543cf7e48b822b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Fixes: 10905d70d788 ("media: media-request: implement media requests")
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
/linux-master/tools/testing/selftests/bpf/progs/
H A Dbpf_mod_race.c46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
H A Dksym_race.c46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
H A Dkfunc_call_race.c46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
46565696 Fri Jan 14 09:39:53 MST 2022 Kumar Kartikeya Dwivedi <memxor@gmail.com> selftests/bpf: Add test for race in btf_try_get_module

This adds a complete test case to ensure we never take references to
modules not in MODULE_STATE_LIVE, which can lead to UAF, and it also
ensures we never access btf->kfunc_set_tab in an inconsistent state.

The test uses userfaultfd to artificially widen the race.

When run on an unpatched kernel, it leads to the following splat:

[root@(none) bpf]# ./test_progs -t bpf_mod_race/ksym
[ 55.498171] BUG: unable to handle page fault for address: fffffbfff802548b
[ 55.499206] #PF: supervisor read access in kernel mode
[ 55.499855] #PF: error_code(0x0000) - not-present page
[ 55.500555] PGD a4fa9067 P4D a4fa9067 PUD a4fa5067 PMD 1b44067 PTE 0
[ 55.501499] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 55.502195] CPU: 0 PID: 83 Comm: kworker/0:2 Tainted: G OE 5.16.0-rc4+ #151
[ 55.503388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 55.504777] Workqueue: events bpf_prog_free_deferred
[ 55.505563] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.509140] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.509977] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.511096] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.512143] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.513228] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.514332] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.515418] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.516705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.517560] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.518672] PKRU: 55555554
[ 55.519022] Call Trace:
[ 55.519483] <TASK>
[ 55.519884] module_put.part.0+0x2a/0x180
[ 55.520642] bpf_prog_free_deferred+0x129/0x2e0
[ 55.521478] process_one_work+0x4fa/0x9e0
[ 55.522122] ? pwq_dec_nr_in_flight+0x100/0x100
[ 55.522878] ? rwlock_bug.part.0+0x60/0x60
[ 55.523551] worker_thread+0x2eb/0x700
[ 55.524176] ? __kthread_parkme+0xd8/0xf0
[ 55.524853] ? process_one_work+0x9e0/0x9e0
[ 55.525544] kthread+0x23a/0x270
[ 55.526088] ? set_kthread_struct+0x80/0x80
[ 55.526798] ret_from_fork+0x1f/0x30
[ 55.527413] </TASK>
[ 55.527813] Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod]
[ 55.530846] CR2: fffffbfff802548b
[ 55.531341] ---[ end trace 1af41803c054ad6d ]---
[ 55.532136] RIP: 0010:kasan_check_range+0x184/0x1d0
[ 55.535887] RSP: 0018:ffff88800560fcf0 EFLAGS: 00010282
[ 55.536711] RAX: fffffbfff802548b RBX: fffffbfff802548c RCX: ffffffff9337b6ba
[ 55.537821] RDX: fffffbfff802548c RSI: 0000000000000004 RDI: ffffffffc012a458
[ 55.538899] RBP: fffffbfff802548b R08: 0000000000000001 R09: ffffffffc012a45b
[ 55.539928] R10: fffffbfff802548b R11: 0000000000000001 R12: ffff888001b5f598
[ 55.541021] R13: ffff888004f49ac8 R14: 0000000000000000 R15: ffff888092449400
[ 55.542108] FS: 0000000000000000(0000) GS:ffff888092400000(0000) knlGS:0000000000000000
[ 55.543260]CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.544136] CR2: fffffbfff802548b CR3: 0000000007c10006 CR4: 0000000000770ef0
[ 55.545317] PKRU: 55555554
[ 55.545671] note: kworker/0:2[83] exited with preempt_count 1

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-11-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
/linux-master/include/net/
H A Dipv6_frag.hdiff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff d5dd8879 Tue Jun 18 12:09:00 MDT 2019 Eric Dumazet <edumazet@google.com> inet: fix various use-after-free in defrags units

syzbot reported another issue caused by my recent patches. [1]

The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.

We need to make sure that timers can catch fqdir->dead being set,
to bailout.

Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :

1) qfdir->dead = 1;

2) netns dismantle (freeing of various data structure)

This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()

This also means we can use a regular work queue, we no
longer need rcu_work.

Tested:

$ time for i in {1..1000}; do unshare -n /bin/false;done

real 0m2.585s
user 0m0.160s
sys 0m2.214s

[1]

BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860

CPU: 1 PID: 11860 Comm: syz-executor.4 Not tainted 5.2.0-rc2+ #22
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
expire_timers kernel/time/timer.c:1366 [inline]
__run_timers kernel/time/timer.c:1685 [inline]
__run_timers kernel/time/timer.c:1653 [inline]
run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
__do_softirq+0x25c/0x94c kernel/softirq.c:293
invoke_softirq kernel/softirq.c:374 [inline]
irq_exit+0x180/0x1d0 kernel/softirq.c:414
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
</IRQ>
RIP: 0010:tomoyo_domain_quota_is_ok+0x131/0x540 security/tomoyo/util.c:1035
Code: 24 4c 3b 65 d0 0f 84 9c 00 00 00 e8 19 1d 73 fe 49 8d 7c 24 18 48 ba 00 00 00 00 00 fc ff df 48 89 f8 48 c1 e8 03 0f b6 04 10 <48> 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 69 03 00 00 41 0f b6 5c
RSP: 0018:ffff88806ae079c0 EFLAGS: 00000a02 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffc9000e655000
RDX: dffffc0000000000 RSI: ffffffff82fd88a7 RDI: ffff888086202398
RBP: ffff88806ae07a00 R08: ffff88808b6c8700 R09: ffffed100d5c0f4d
R10: ffffed100d5c0f4c R11: 0000000000000000 R12: ffff888086202380
R13: 0000000000000030 R14: 00000000000000d3 R15: 0000000000000000
tomoyo_supervisor+0x2e8/0xef0 security/tomoyo/common.c:2087
tomoyo_audit_path_number_log security/tomoyo/file.c:235 [inline]
tomoyo_path_number_perm+0x42f/0x520 security/tomoyo/file.c:734
tomoyo_file_ioctl+0x23/0x30 security/tomoyo/tomoyo.c:335
security_file_ioctl+0x77/0xc0 security/security.c:1370
ksys_ioctl+0x57/0xd0 fs/ioctl.c:711
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4592c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8db5e44c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004592c9
RDX: 0000000020000080 RSI: 00000000000089f1 RDI: 0000000000000006
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8db5e456d4
R13: 00000000004cc770 R14: 00000000004d5cd8 R15: 00000000ffffffff

Allocated by task 9047:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3326 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3488
kmem_cache_zalloc include/linux/slab.h:732 [inline]
net_alloc net/core/net_namespace.c:386 [inline]
copy_net_ns+0xed/0x340 net/core/net_namespace.c:426
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x440/0x980 kernel/fork.c:2692
__do_sys_unshare kernel/fork.c:2760 [inline]
__se_sys_unshare kernel/fork.c:2758 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2541:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3432 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3698
net_free net/core/net_namespace.c:402 [inline]
net_drop_ns.part.0+0x70/0x90 net/core/net_namespace.c:409
net_drop_ns net/core/net_namespace.c:408 [inline]
cleanup_net+0x538/0x960 net/core/net_namespace.c:571
process_one_work+0x989/0x1790 kernel/workqueue.c:2269
worker_thread+0x98/0xe40 kernel/workqueue.c:2415
kthread+0x354/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808b9fe100
which belongs to the cache net_namespace of size 6784
The buggy address is located 560 bytes inside of
6784-byte region [ffff88808b9fe100, ffff88808b9ffb80)
The buggy address belongs to the page:
page:ffffea00022e7f80 refcount:1 mapcount:0 mapping:ffff88821b6f60c0 index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000256f288 ffffea0001bbef08 ffff88821b6f60c0
raw: 0000000000000000 ffff88808b9fe100 0000000100000001 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808b9fe200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88808b9fe300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808b9fe380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808b9fe400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
/linux-master/drivers/pmdomain/arm/
H A Dscmi_perf_domain.cdiff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
diff eb5555d4 Thu Jan 25 12:17:56 MST 2024 Cristian Marussi <cristian.marussi@arm.com> pmdomain: arm: Fix NULL dereference on scmi_perf_domain removal

On unloading of the scmi_perf_domain module got the below splat, when in
the DT provided to the system under test the '#power-domain-cells' property
was missing. Indeed, this particular setup causes the probe to bail out
early without giving any error, which leads to the ->remove() callback gets
to run too, but without all the expected initialized structures in place.

Add a check and bail out early on remove too.

Call trace:
scmi_perf_domain_remove+0x28/0x70 [scmi_perf_domain]
scmi_dev_remove+0x28/0x40 [scmi_core]
device_remove+0x54/0x90
device_release_driver_internal+0x1dc/0x240
driver_detach+0x58/0xa8
bus_remove_driver+0x78/0x108
driver_unregister+0x38/0x70
scmi_driver_unregister+0x28/0x180 [scmi_core]
scmi_perf_domain_driver_exit+0x18/0xb78 [scmi_perf_domain]
__arm64_sys_delete_module+0x1a8/0x2c0
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x34/0xb8
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Code: a90153f3 f9403c14 f9414800 955f8a05 (b9400a80)
---[ end trace 0000000000000000 ]---

Fixes: 2af23ceb8624 ("pmdomain: arm: Add the SCMI performance domain")
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240125191756.868860-1-cristian.marussi@arm.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
/linux-master/kernel/time/
H A Dtimekeeping_debug.cdiff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
diff f222449c Tue Feb 14 21:43:32 MST 2017 Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> timekeeping: Use deferred printk() in debug code

We cannot do printk() from tk_debug_account_sleep_time(), because
tk_debug_account_sleep_time() is called under tk_core seq lock.
The reason why printk() is unsafe there is that console_sem may
invoke scheduler (up()->wake_up_process()->activate_task()), which,
in turn, can return back to timekeeping code, for instance, via
get_time()->ktime_get(), deadlocking the system on tk_core seq lock.

[ 48.950592] ======================================================
[ 48.950622] [ INFO: possible circular locking dependency detected ]
[ 48.950622] 4.10.0-rc7-next-20170213+ #101 Not tainted
[ 48.950622] -------------------------------------------------------
[ 48.950622] kworker/0:0/3 is trying to acquire lock:
[ 48.950653] (tk_core){----..}, at: [<c01cc624>] retrigger_next_event+0x4c/0x90
[ 48.950683]
but task is already holding lock:
[ 48.950683] (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.950714]
which lock already depends on the new lock.

[ 48.950714]
the existing dependency chain (in reverse order) is:
[ 48.950714]
-> #5 (hrtimer_bases.lock){-.-...}:
[ 48.950744] _raw_spin_lock_irqsave+0x50/0x64
[ 48.950775] lock_hrtimer_base+0x28/0x58
[ 48.950775] hrtimer_start_range_ns+0x20/0x5c8
[ 48.950775] __enqueue_rt_entity+0x320/0x360
[ 48.950805] enqueue_rt_entity+0x2c/0x44
[ 48.950805] enqueue_task_rt+0x24/0x94
[ 48.950836] ttwu_do_activate+0x54/0xc0
[ 48.950836] try_to_wake_up+0x248/0x5c8
[ 48.950836] __setup_irq+0x420/0x5f0
[ 48.950836] request_threaded_irq+0xdc/0x184
[ 48.950866] devm_request_threaded_irq+0x58/0xa4
[ 48.950866] omap_i2c_probe+0x530/0x6a0
[ 48.950897] platform_drv_probe+0x50/0xb0
[ 48.950897] driver_probe_device+0x1f8/0x2cc
[ 48.950897] __driver_attach+0xc0/0xc4
[ 48.950927] bus_for_each_dev+0x6c/0xa0
[ 48.950927] bus_add_driver+0x100/0x210
[ 48.950927] driver_register+0x78/0xf4
[ 48.950958] do_one_initcall+0x3c/0x16c
[ 48.950958] kernel_init_freeable+0x20c/0x2d8
[ 48.950958] kernel_init+0x8/0x110
[ 48.950988] ret_from_fork+0x14/0x24
[ 48.950988]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[ 48.951019] _raw_spin_lock+0x40/0x50
[ 48.951019] rq_offline_rt+0x9c/0x2bc
[ 48.951019] set_rq_offline.part.2+0x2c/0x58
[ 48.951049] rq_attach_root+0x134/0x144
[ 48.951049] cpu_attach_domain+0x18c/0x6f4
[ 48.951049] build_sched_domains+0xba4/0xd80
[ 48.951080] sched_init_smp+0x68/0x10c
[ 48.951080] kernel_init_freeable+0x160/0x2d8
[ 48.951080] kernel_init+0x8/0x110
[ 48.951080] ret_from_fork+0x14/0x24
[ 48.951110]
-> #3 (&rq->lock){-.-.-.}:
[ 48.951110] _raw_spin_lock+0x40/0x50
[ 48.951141] task_fork_fair+0x30/0x124
[ 48.951141] sched_fork+0x194/0x2e0
[ 48.951141] copy_process.part.5+0x448/0x1a20
[ 48.951171] _do_fork+0x98/0x7e8
[ 48.951171] kernel_thread+0x2c/0x34
[ 48.951171] rest_init+0x1c/0x18c
[ 48.951202] start_kernel+0x35c/0x3d4
[ 48.951202] 0x8000807c
[ 48.951202]
-> #2 (&p->pi_lock){-.-.-.}:
[ 48.951232] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951232] try_to_wake_up+0x30/0x5c8
[ 48.951232] up+0x4c/0x60
[ 48.951263] __up_console_sem+0x2c/0x58
[ 48.951263] console_unlock+0x3b4/0x650
[ 48.951263] vprintk_emit+0x270/0x474
[ 48.951293] vprintk_default+0x20/0x28
[ 48.951293] printk+0x20/0x30
[ 48.951324] kauditd_hold_skb+0x94/0xb8
[ 48.951324] kauditd_thread+0x1a4/0x56c
[ 48.951324] kthread+0x104/0x148
[ 48.951354] ret_from_fork+0x14/0x24
[ 48.951354]
-> #1 ((console_sem).lock){-.....}:
[ 48.951385] _raw_spin_lock_irqsave+0x50/0x64
[ 48.951385] down_trylock+0xc/0x2c
[ 48.951385] __down_trylock_console_sem+0x24/0x80
[ 48.951385] console_trylock+0x10/0x8c
[ 48.951416] vprintk_emit+0x264/0x474
[ 48.951416] vprintk_default+0x20/0x28
[ 48.951416] printk+0x20/0x30
[ 48.951446] tk_debug_account_sleep_time+0x5c/0x70
[ 48.951446] __timekeeping_inject_sleeptime.constprop.3+0x170/0x1a0
[ 48.951446] timekeeping_resume+0x218/0x23c
[ 48.951477] syscore_resume+0x94/0x42c
[ 48.951477] suspend_enter+0x554/0x9b4
[ 48.951477] suspend_devices_and_enter+0xd8/0x4b4
[ 48.951507] enter_state+0x934/0xbd4
[ 48.951507] pm_suspend+0x14/0x70
[ 48.951507] state_store+0x68/0xc8
[ 48.951538] kernfs_fop_write+0xf4/0x1f8
[ 48.951538] __vfs_write+0x1c/0x114
[ 48.951538] vfs_write+0xa0/0x168
[ 48.951568] SyS_write+0x3c/0x90
[ 48.951568] __sys_trace_return+0x0/0x10
[ 48.951568]
-> #0 (tk_core){----..}:
[ 48.951599] lock_acquire+0xe0/0x294
[ 48.951599] ktime_get_update_offsets_now+0x5c/0x1d4
[ 48.951629] retrigger_next_event+0x4c/0x90
[ 48.951629] on_each_cpu+0x40/0x7c
[ 48.951629] clock_was_set_work+0x14/0x20
[ 48.951660] process_one_work+0x2b4/0x808
[ 48.951660] worker_thread+0x3c/0x550
[ 48.951660] kthread+0x104/0x148
[ 48.951690] ret_from_fork+0x14/0x24
[ 48.951690]
other info that might help us debug this:

[ 48.951690] Chain exists of:
tk_core --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[ 48.951721] Possible unsafe locking scenario:

[ 48.951721] CPU0 CPU1
[ 48.951721] ---- ----
[ 48.951721] lock(hrtimer_bases.lock);
[ 48.951751] lock(&rt_b->rt_runtime_lock);
[ 48.951751] lock(hrtimer_bases.lock);
[ 48.951751] lock(tk_core);
[ 48.951782]
*** DEADLOCK ***

[ 48.951782] 3 locks held by kworker/0:0/3:
[ 48.951782] #0: ("events"){.+.+.+}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951812] #1: (hrtimer_work){+.+...}, at: [<c0156590>] process_one_work+0x1f8/0x808
[ 48.951843] #2: (hrtimer_bases.lock){-.-...}, at: [<c01cc610>] retrigger_next_event+0x38/0x90
[ 48.951843] stack backtrace:
[ 48.951873] CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.10.0-rc7-next-20170213+
[ 48.951904] Workqueue: events clock_was_set_work
[ 48.951904] [<c0110208>] (unwind_backtrace) from [<c010c224>] (show_stack+0x10/0x14)
[ 48.951934] [<c010c224>] (show_stack) from [<c04ca6c0>] (dump_stack+0xac/0xe0)
[ 48.951934] [<c04ca6c0>] (dump_stack) from [<c019b5cc>] (print_circular_bug+0x1d0/0x308)
[ 48.951965] [<c019b5cc>] (print_circular_bug) from [<c019d2a8>] (validate_chain+0xf50/0x1324)
[ 48.951965] [<c019d2a8>] (validate_chain) from [<c019ec18>] (__lock_acquire+0x468/0x7e8)
[ 48.951995] [<c019ec18>] (__lock_acquire) from [<c019f634>] (lock_acquire+0xe0/0x294)
[ 48.951995] [<c019f634>] (lock_acquire) from [<c01d0ea0>] (ktime_get_update_offsets_now+0x5c/0x1d4)
[ 48.952026] [<c01d0ea0>] (ktime_get_update_offsets_now) from [<c01cc624>] (retrigger_next_event+0x4c/0x90)
[ 48.952026] [<c01cc624>] (retrigger_next_event) from [<c01e4e24>] (on_each_cpu+0x40/0x7c)
[ 48.952056] [<c01e4e24>] (on_each_cpu) from [<c01cafc4>] (clock_was_set_work+0x14/0x20)
[ 48.952056] [<c01cafc4>] (clock_was_set_work) from [<c015664c>] (process_one_work+0x2b4/0x808)
[ 48.952087] [<c015664c>] (process_one_work) from [<c0157774>] (worker_thread+0x3c/0x550)
[ 48.952087] [<c0157774>] (worker_thread) from [<c015d644>] (kthread+0x104/0x148)
[ 48.952087] [<c015d644>] (kthread) from [<c0107830>] (ret_from_fork+0x14/0x24)

Replace printk() with printk_deferred(), which does not call into
the scheduler.

Fixes: 0bf43f15db85 ("timekeeping: Prints the amounts of time spent during suspend")
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "[4.9+]" <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170215044332.30449-1-sergey.senozhatsky@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
/linux-master/drivers/dma/idxd/
H A Dperfmon.cdiff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
diff f221033f Wed Mar 13 15:40:31 MDT 2024 Fenghua Yu <fenghua.yu@intel.com> dmaengine: idxd: Fix oops during rmmod on single-CPU platforms

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

BUG: unable to handle page fault for address: 000000000002a2b8
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 1470e1067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
RIP: 0010:mutex_lock+0x2e/0x50
...
Call Trace:
<TASK>
__die+0x24/0x70
page_fault_oops+0x82/0x160
do_user_addr_fault+0x65/0x6b0
__pfx___rdmsr_safe_on_cpu+0x10/0x10
exc_page_fault+0x7d/0x170
asm_exc_page_fault+0x26/0x30
mutex_lock+0x2e/0x50
mutex_lock+0x1e/0x50
perf_pmu_migrate_context+0x87/0x1f0
perf_event_cpu_offline+0x76/0x90 [idxd]
cpuhp_invoke_callback+0xa2/0x4f0
__pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
cpuhp_thread_fun+0x98/0x150
smpboot_thread_fn+0x27/0x260
smpboot_thread_fn+0x1af/0x260
__pfx_smpboot_thread_fn+0x10/0x10
kthread+0x103/0x140
__pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
__pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
<TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
Reported-by: Terrence Xu <terrence.xu@intel.com>
Tested-by: Terrence Xu <terrence.xu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
/linux-master/net/netfilter/
H A Dnft_byteorder.cdiff c301f098 Fri Nov 03 00:42:51 MDT 2023 Dan Carpenter <dan.carpenter@linaro.org> netfilter: nf_tables: fix pointer math issue in nft_byteorder_eval()

The problem is in nft_byteorder_eval() where we are iterating through a
loop and writing to dst[0], dst[1], dst[2] and so on... On each
iteration we are writing 8 bytes. But dst[] is an array of u32 so each
element only has space for 4 bytes. That means that every iteration
overwrites part of the previous element.

I spotted this bug while reviewing commit caf3ef7468f7 ("netfilter:
nf_tables: prevent OOB access in nft_byteorder_eval") which is a related
issue. I think that the reason we have not detected this bug in testing
is that most of time we only write one element.

Fixes: ce1e7989d989 ("netfilter: nft_byteorder: provide 64bit le/be conversion")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
diff caf3ef74 Wed Jul 05 15:05:35 MDT 2023 Thadeu Lima de Souza Cascardo <cascardo@canonical.com> netfilter: nf_tables: prevent OOB access in nft_byteorder_eval

When evaluating byteorder expressions with size 2, a union with 32-bit and
16-bit members is used. Since the 16-bit members are aligned to 32-bit,
the array accesses will be out-of-bounds.

It may lead to a stack-out-of-bounds access like the one below:

[ 23.095215] ==================================================================
[ 23.095625] BUG: KASAN: stack-out-of-bounds in nft_byteorder_eval+0x13c/0x320
[ 23.096020] Read of size 2 at addr ffffc90000007948 by task ping/115
[ 23.096358]
[ 23.096456] CPU: 0 PID: 115 Comm: ping Not tainted 6.4.0+ #413
[ 23.096770] Call Trace:
[ 23.096910] <IRQ>
[ 23.097030] dump_stack_lvl+0x60/0xc0
[ 23.097218] print_report+0xcf/0x630
[ 23.097388] ? nft_byteorder_eval+0x13c/0x320
[ 23.097577] ? kasan_addr_to_slab+0xd/0xc0
[ 23.097760] ? nft_byteorder_eval+0x13c/0x320
[ 23.097949] kasan_report+0xc9/0x110
[ 23.098106] ? nft_byteorder_eval+0x13c/0x320
[ 23.098298] __asan_load2+0x83/0xd0
[ 23.098453] nft_byteorder_eval+0x13c/0x320
[ 23.098659] nft_do_chain+0x1c8/0xc50
[ 23.098852] ? __pfx_nft_do_chain+0x10/0x10
[ 23.099078] ? __kasan_check_read+0x11/0x20
[ 23.099295] ? __pfx___lock_acquire+0x10/0x10
[ 23.099535] ? __pfx___lock_acquire+0x10/0x10
[ 23.099745] ? __kasan_check_read+0x11/0x20
[ 23.099929] nft_do_chain_ipv4+0xfe/0x140
[ 23.100105] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.100327] ? lock_release+0x204/0x400
[ 23.100515] ? nf_hook.constprop.0+0x340/0x550
[ 23.100779] nf_hook_slow+0x6c/0x100
[ 23.100977] ? __pfx_nft_do_chain_ipv4+0x10/0x10
[ 23.101223] nf_hook.constprop.0+0x334/0x550
[ 23.101443] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.101677] ? __pfx_nf_hook.constprop.0+0x10/0x10
[ 23.101882] ? __pfx_ip_rcv_finish+0x10/0x10
[ 23.102071] ? __pfx_ip_local_deliver_finish+0x10/0x10
[ 23.102291] ? rcu_read_lock_held+0x4b/0x70
[ 23.102481] ip_local_deliver+0xbb/0x110
[ 23.102665] ? __pfx_ip_rcv+0x10/0x10
[ 23.102839] ip_rcv+0x199/0x2a0
[ 23.102980] ? __pfx_ip_rcv+0x10/0x10
[ 23.103140] __netif_receive_skb_one_core+0x13e/0x150
[ 23.103362] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 23.103647] ? mark_held_locks+0x48/0xa0
[ 23.103819] ? process_backlog+0x36c/0x380
[ 23.103999] __netif_receive_skb+0x23/0xc0
[ 23.104179] process_backlog+0x91/0x380
[ 23.104350] __napi_poll.constprop.0+0x66/0x360
[ 23.104589] ? net_rx_action+0x1cb/0x610
[ 23.104811] net_rx_action+0x33e/0x610
[ 23.105024] ? _raw_spin_unlock+0x23/0x50
[ 23.105257] ? __pfx_net_rx_action+0x10/0x10
[ 23.105485] ? mark_held_locks+0x48/0xa0
[ 23.105741] __do_softirq+0xfa/0x5ab
[ 23.105956] ? __dev_queue_xmit+0x765/0x1c00
[ 23.106193] do_softirq.part.0+0x49/0xc0
[ 23.106423] </IRQ>
[ 23.106547] <TASK>
[ 23.106670] __local_bh_enable_ip+0xf5/0x120
[ 23.106903] __dev_queue_xmit+0x789/0x1c00
[ 23.107131] ? __pfx___dev_queue_xmit+0x10/0x10
[ 23.107381] ? find_held_lock+0x8e/0xb0
[ 23.107585] ? lock_release+0x204/0x400
[ 23.107798] ? neigh_resolve_output+0x185/0x350
[ 23.108049] ? mark_held_locks+0x48/0xa0
[ 23.108265] ? neigh_resolve_output+0x185/0x350
[ 23.108514] neigh_resolve_output+0x246/0x350
[ 23.108753] ? neigh_resolve_output+0x246/0x350
[ 23.109003] ip_finish_output2+0x3c3/0x10b0
[ 23.109250] ? __pfx_ip_finish_output2+0x10/0x10
[ 23.109510] ? __pfx_nf_hook+0x10/0x10
[ 23.109732] __ip_finish_output+0x217/0x390
[ 23.109978] ip_finish_output+0x2f/0x130
[ 23.110207] ip_output+0xc9/0x170
[ 23.110404] ip_push_pending_frames+0x1a0/0x240
[ 23.110652] raw_sendmsg+0x102e/0x19e0
[ 23.110871] ? __pfx_raw_sendmsg+0x10/0x10
[ 23.111093] ? lock_release+0x204/0x400
[ 23.111304] ? __mod_lruvec_page_state+0x148/0x330
[ 23.111567] ? find_held_lock+0x8e/0xb0
[ 23.111777] ? find_held_lock+0x8e/0xb0
[ 23.111993] ? __rcu_read_unlock+0x7c/0x2f0
[ 23.112225] ? aa_sk_perm+0x18a/0x550
[ 23.112431] ? filemap_map_pages+0x4f1/0x900
[ 23.112665] ? __pfx_aa_sk_perm+0x10/0x10
[ 23.112880] ? find_held_lock+0x8e/0xb0
[ 23.113098] inet_sendmsg+0xa0/0xb0
[ 23.113297] ? inet_sendmsg+0xa0/0xb0
[ 23.113500] ? __pfx_inet_sendmsg+0x10/0x10
[ 23.113727] sock_sendmsg+0xf4/0x100
[ 23.113924] ? move_addr_to_kernel.part.0+0x4f/0xa0
[ 23.114190] __sys_sendto+0x1d4/0x290
[ 23.114391] ? __pfx___sys_sendto+0x10/0x10
[ 23.114621] ? __pfx_mark_lock.part.0+0x10/0x10
[ 23.114869] ? lock_release+0x204/0x400
[ 23.115076] ? find_held_lock+0x8e/0xb0
[ 23.115287] ? rcu_is_watching+0x23/0x60
[ 23.115503] ? __rseq_handle_notify_resume+0x6e2/0x860
[ 23.115778] ? __kasan_check_write+0x14/0x30
[ 23.116008] ? blkcg_maybe_throttle_current+0x8d/0x770
[ 23.116285] ? mark_held_locks+0x28/0xa0
[ 23.116503] ? do_syscall_64+0x37/0x90
[ 23.116713] __x64_sys_sendto+0x7f/0xb0
[ 23.116924] do_syscall_64+0x59/0x90
[ 23.117123] ? irqentry_exit_to_user_mode+0x25/0x30
[ 23.117387] ? irqentry_exit+0x77/0xb0
[ 23.117593] ? exc_page_fault+0x92/0x140
[ 23.117806] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 23.118081] RIP: 0033:0x7f744aee2bba
[ 23.118282] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 23.119237] RSP: 002b:00007ffd04a7c9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 23.119644] RAX: ffffffffffffffda RBX: 00007ffd04a7e0a0 RCX: 00007f744aee2bba
[ 23.120023] RDX: 0000000000000040 RSI: 000056488e9e6300 RDI: 0000000000000003
[ 23.120413] RBP: 000056488e9e6300 R08: 00007ffd04a80320 R09: 0000000000000010
[ 23.120809] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 23.121219] R13: 00007ffd04a7dc38 R14: 00007ffd04a7ca00 R15: 00007ffd04a7e0a0
[ 23.121617] </TASK>
[ 23.121749]
[ 23.121845] The buggy address belongs to the virtual mapping at
[ 23.121845] [ffffc90000000000, ffffc90000009000) created by:
[ 23.121845] irq_init_percpu_irqstack+0x1cf/0x270
[ 23.122707]
[ 23.122803] The buggy address belongs to the physical page:
[ 23.123104] page:0000000072ac19f0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24a09
[ 23.123609] flags: 0xfffffc0001000(reserved|node=0|zone=1|lastcpupid=0x1fffff)
[ 23.123998] page_type: 0xffffffff()
[ 23.124194] raw: 000fffffc0001000 ffffea0000928248 ffffea0000928248 0000000000000000
[ 23.124610] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 23.125023] page dumped because: kasan: bad access detected
[ 23.125326]
[ 23.125421] Memory state around the buggy address:
[ 23.125682] ffffc90000007800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 23.126072] ffffc90000007880: 00 00 00 00 00 f1 f1 f1 f1 f1 f1 00 00 f2 f2 00
[ 23.126455] >ffffc90000007900: 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 00 00 00
[ 23.126840] ^
[ 23.127138] ffffc90000007980: 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3
[ 23.127522] ffffc90000007a00: f3 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 23.127906] ==================================================================
[ 23.128324] Disabling lock debugging due to kernel taint

Using simple s16 pointers for the 16-bit accesses fixes the problem. For
the 32-bit accesses, src and dst can be used directly.

Fixes: 96518518cc41 ("netfilter: add nftables")
Cc: stable@vger.kernel.org
Reported-by: Tanguy DUBROCA (@SidewayRE) from @Synacktiv working with ZDI
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
/linux-master/arch/riscv/mm/
H A Dpageattr.cdiff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
diff fb1cf087 Tue May 14 23:50:40 MDT 2024 Nam Cao <namcao@linutronix.de> riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context

__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.

This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c

Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.

Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.1715750938.git.namcao@linutronix.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
/linux-master/drivers/gpu/drm/msm/dp/
H A Ddp_panel.cdiff 0e7f2705 Tue Dec 27 10:45:03 MST 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: add support of max dp link rate

By default, HBR2 (5.4G) is the max link rate be supported. This patch
uses the actual limit specified by DT and removes the artificial
limitation to 5.4 Gbps. Supporting HBR3 is a consequence of that.

Changes in v2:
-- add max link rate from dtsi

Changes in v3:
-- parser max_data_lanes and max_dp_link_rate from dp_out endpoint

Changes in v4:
-- delete unnecessary pr_err

Changes in v5:
-- split parser function into different patch

Changes in v9:
-- revised commit test

Changes in v13:
-- repalced "properity" with "property"

Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/516097/
Link: https://lore.kernel.org/r/1672163103-31254-6-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
diff 3f65b1e2 Tue Apr 26 15:12:14 MDT 2022 Kuogee Hsieh <quic_khsieh@quicinc.com> drm/msm/dp: remove fail safe mode related code

Current DP driver implementation has adding safe mode done at
dp_hpd_plug_handle() which is expected to be executed under event
thread context.

However there is possible circular locking happen (see blow stack trace)
after edp driver call dp_hpd_plug_handle() from dp_bridge_enable() which
is executed under drm_thread context.

After review all possibilities methods and as discussed on
https://patchwork.freedesktop.org/patch/483155/, supporting EDID
compliance tests in the driver is quite hacky. As seen with other
vendor drivers, supporting these will be much easier with IGT. Hence
removing all the related fail safe code for it so that no possibility
of circular lock will happen.
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

======================================================
WARNING: possible circular locking dependency detected
5.15.35-lockdep #6 Tainted: G W
------------------------------------------------------
frecon/429 is trying to acquire lock:
ffffff808dc3c4e8 (&dev->mode_config.mutex){+.+.}-{3:3}, at:
dp_panel_add_fail_safe_mode+0x4c/0xa0

but task is already holding lock:
ffffff808dc441e0 (&kms->commit_lock[i]){+.+.}-{3:3}, at: lock_crtcs+0xb4/0x124

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&kms->commit_lock[i]){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
lock_crtcs+0xb4/0x124
msm_atomic_commit_tail+0x330/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #2 (crtc_ww_class_mutex){+.+.}-{3:3}:
__mutex_lock_common+0x174/0x1a64
ww_mutex_lock+0xb8/0x278
modeset_lock+0x304/0x4ac
drm_modeset_lock+0x4c/0x7c
drmm_mode_config_init+0x4a8/0xc50
msm_drm_init+0x274/0xac0
msm_drm_bind+0x20/0x2c
try_to_bring_up_master+0x3dc/0x470
__component_add+0x18c/0x3c0
component_add+0x1c/0x28
dp_display_probe+0x954/0xa98
platform_probe+0x124/0x15c
really_probe+0x1b0/0x5f8
__driver_probe_device+0x174/0x20c
driver_probe_device+0x70/0x134
__device_attach_driver+0x130/0x1d0
bus_for_each_drv+0xfc/0x14c
__device_attach+0x1bc/0x2bc
device_initial_probe+0x1c/0x28
bus_probe_device+0x94/0x178
deferred_probe_work_func+0x1a4/0x1f0
process_one_work+0x5d4/0x9dc
worker_thread+0x898/0xccc
kthread+0x2d4/0x3d4
ret_from_fork+0x10/0x20

-> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
ww_acquire_init+0x1c4/0x2c8
drm_modeset_acquire_init+0x44/0xc8
drm_helper_probe_single_connector_modes+0xb0/0x12dc
drm_mode_getconnector+0x5dc/0xfe8
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

-> #0 (&dev->mode_config.mutex){+.+.}-{3:3}:
__lock_acquire+0x2650/0x672c
lock_acquire+0x1b4/0x4ac
__mutex_lock_common+0x174/0x1a64
mutex_lock_nested+0x98/0xac
dp_panel_add_fail_safe_mode+0x4c/0xa0
dp_hpd_plug_handle+0x1f0/0x280
dp_bridge_enable+0x94/0x2b8
drm_atomic_bridge_chain_enable+0x11c/0x168
drm_atomic_helper_commit_modeset_enables+0x500/0x740
msm_atomic_commit_tail+0x3e4/0x748
commit_tail+0x19c/0x278
drm_atomic_helper_commit+0x1dc/0x1f0
drm_atomic_commit+0xc0/0xd8
drm_atomic_helper_set_config+0xb4/0x134
drm_mode_setcrtc+0x688/0x1248
drm_ioctl_kernel+0x1e4/0x338
drm_ioctl+0x3a4/0x684
__arm64_sys_ioctl+0x118/0x154
invoke_syscall+0x78/0x224
el0_svc_common+0x178/0x200
do_el0_svc+0x94/0x13c
el0_svc+0x5c/0xec
el0t_64_sync_handler+0x78/0x108
el0t_64_sync+0x1a4/0x1a8

Changes in v2:
-- re text commit title
-- remove all fail safe mode

Changes in v3:
-- remove dp_panel_add_fail_safe_mode() from dp_panel.h
-- add Fixes

Changes in v5:
-- to=dianders@chromium.org

Changes in v6:
-- fix Fixes commit ID

Fixes: 8b2c181e3dcf ("drm/msm/dp: add fail safe mode outside of event_mutex context")
Reported-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com>
Link: https://lore.kernel.org/r/1651007534-31842-1-git-send-email-quic_khsieh@quicinc.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
/linux-master/net/sched/
H A Dsch_frag.cdiff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 31fe34a0 Wed Apr 28 07:23:14 MDT 2021 Davide Caratti <dcaratti@redhat.com> net/sched: sch_frag: fix stack OOB read while fragmenting IPv4 packets

when 'act_mirred' tries to fragment IPv4 packets that had been previously
re-assembled using 'act_ct', splats like the following can be observed on
kernels built with KASAN:

BUG: KASAN: stack-out-of-bounds in ip_do_fragment+0x1b03/0x1f60
Read of size 1 at addr ffff888147009574 by task ping/947

CPU: 0 PID: 947 Comm: ping Not tainted 5.12.0-rc6+ #418
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
Call Trace:
<IRQ>
dump_stack+0x92/0xc1
print_address_description.constprop.7+0x1a/0x150
kasan_report.cold.13+0x7f/0x111
ip_do_fragment+0x1b03/0x1f60
sch_fragment+0x4bf/0xe40
tcf_mirred_act+0xc3d/0x11a0 [act_mirred]
tcf_action_exec+0x104/0x3e0
fl_classify+0x49a/0x5e0 [cls_flower]
tcf_classify_ingress+0x18a/0x820
__netif_receive_skb_core+0xae7/0x3340
__netif_receive_skb_one_core+0xb6/0x1b0
process_backlog+0x1ef/0x6c0
__napi_poll+0xaa/0x500
net_rx_action+0x702/0xac0
__do_softirq+0x1e4/0x97f
do_softirq+0x71/0x90
</IRQ>
__local_bh_enable_ip+0xdb/0xf0
ip_finish_output2+0x760/0x2120
ip_do_fragment+0x15a5/0x1f60
__ip_finish_output+0x4c2/0xea0
ip_output+0x1ca/0x4d0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x1c4b/0x2d00
sock_sendmsg+0xdb/0x110
__sys_sendto+0x1d7/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f82e13853eb
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 75 42 2c 00 41 89 ca 8b 00 85 c0 75 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 41 57 4d 89 c7 41 56 41 89
RSP: 002b:00007ffe01fad888 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00005571aac13700 RCX: 00007f82e13853eb
RDX: 0000000000002330 RSI: 00005571aac13700 RDI: 0000000000000003
RBP: 0000000000002330 R08: 00005571aac10500 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe01faefb0
R13: 00007ffe01fad890 R14: 00007ffe01fad980 R15: 00005571aac0f0a0

The buggy address belongs to the page:
page:000000001dff2e03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x147009
flags: 0x17ffffc0001000(reserved)
raw: 0017ffffc0001000 ffffea00051c0248 ffffea00051c0248 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888147009400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009480: f1 f1 f1 f1 04 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
>ffff888147009500: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
^
ffff888147009580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888147009600: 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2 f2

for IPv4 packets, sch_fragment() uses a temporary struct dst_entry. Then,
in the following call graph:

ip_do_fragment()
ip_skb_dst_mtu()
ip_dst_mtu_maybe_forward()
ip_mtu_locked()

the pointer to struct dst_entry is used as pointer to struct rtable: this
turns the access to struct members like rt_mtu_locked into an OOB read in
the stack. Fix this changing the temporary variable used for IPv4 packets
in sch_fragment(), similarly to what is done for IPv6 few lines below.

Fixes: c129412f74e9 ("net/sched: sch_frag: add generic packet fragment support.")
Cc: <stable@vger.kernel.org> # 5.11
Reported-by: Shuang Li <shuali@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
/linux-master/drivers/net/ethernet/mellanox/mlx5/core/lag/
H A Dmp.cdiff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
diff 27b0420f Mon Apr 18 08:32:19 MDT 2022 Vlad Buslov <vladbu@nvidia.com> net/mlx5e: Lag, Fix use-after-free in fib event handler

Recent commit that modified fib route event handler to handle events
according to their priority introduced use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fin_info during whole mp->mfi lifetime, it could be used after fib_info
instance has already been freed be kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[ 203.588029] ==================================================================
[ 203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[ 203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G B 5.17.0-rc7+ #6
[ 203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[ 203.600053] Call Trace:
[ 203.600608] <TASK>
[ 203.601110] dump_stack_lvl+0x48/0x5e
[ 203.601860] print_address_description.constprop.0+0x1f/0x160
[ 203.602950] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.604073] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.605177] kasan_report.cold+0x83/0xdf
[ 203.605969] ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.607102] mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[ 203.608199] ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[ 203.609382] ? read_word_at_a_time+0xe/0x20
[ 203.610463] ? strscpy+0xa0/0x2a0
[ 203.611463] process_one_work+0x722/0x1270
[ 203.612344] worker_thread+0x540/0x11e0
[ 203.613136] ? rescuer_thread+0xd50/0xd50
[ 203.613949] kthread+0x26e/0x300
[ 203.614627] ? kthread_complete_and_exit+0x20/0x20
[ 203.615542] ret_from_fork+0x1f/0x30
[ 203.616273] </TASK>

[ 203.617174] Allocated by task 3746:
[ 203.617874] kasan_save_stack+0x1e/0x40
[ 203.618644] __kasan_kmalloc+0x81/0xa0
[ 203.619394] fib_create_info+0xb41/0x3c50
[ 203.620213] fib_table_insert+0x190/0x1ff0
[ 203.621020] fib_magic.isra.0+0x246/0x2e0
[ 203.621803] fib_add_ifaddr+0x19f/0x670
[ 203.622563] fib_inetaddr_event+0x13f/0x270
[ 203.623377] blocking_notifier_call_chain+0xd4/0x130
[ 203.624355] __inet_insert_ifa+0x641/0xb20
[ 203.625185] inet_rtm_newaddr+0xc3d/0x16a0
[ 203.626009] rtnetlink_rcv_msg+0x309/0x880
[ 203.626826] netlink_rcv_skb+0x11d/0x340
[ 203.627626] netlink_unicast+0x4cc/0x790
[ 203.628430] netlink_sendmsg+0x762/0xc00
[ 203.629230] sock_sendmsg+0xb2/0xe0
[ 203.629955] ____sys_sendmsg+0x58a/0x770
[ 203.630756] ___sys_sendmsg+0xd8/0x160
[ 203.631523] __sys_sendmsg+0xb7/0x140
[ 203.632294] do_syscall_64+0x35/0x80
[ 203.633045] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.634427] Freed by task 0:
[ 203.635063] kasan_save_stack+0x1e/0x40
[ 203.635844] kasan_set_track+0x21/0x30
[ 203.636618] kasan_set_free_info+0x20/0x30
[ 203.637450] __kasan_slab_free+0xfc/0x140
[ 203.638271] kfree+0x94/0x3b0
[ 203.638903] rcu_core+0x5e4/0x1990
[ 203.639640] __do_softirq+0x1ba/0x5d3

[ 203.640828] Last potentially related work creation:
[ 203.641785] kasan_save_stack+0x1e/0x40
[ 203.642571] __kasan_record_aux_stack+0x9f/0xb0
[ 203.643478] call_rcu+0x88/0x9c0
[ 203.644178] fib_release_info+0x539/0x750
[ 203.644997] fib_table_delete+0x659/0xb80
[ 203.645809] fib_magic.isra.0+0x1a3/0x2e0
[ 203.646617] fib_del_ifaddr+0x93f/0x1300
[ 203.647415] fib_inetaddr_event+0x9f/0x270
[ 203.648251] blocking_notifier_call_chain+0xd4/0x130
[ 203.649225] __inet_del_ifa+0x474/0xc10
[ 203.650016] devinet_ioctl+0x781/0x17f0
[ 203.650788] inet_ioctl+0x1ad/0x290
[ 203.651533] sock_do_ioctl+0xce/0x1c0
[ 203.652315] sock_ioctl+0x27b/0x4f0
[ 203.653058] __x64_sys_ioctl+0x124/0x190
[ 203.653850] do_syscall_64+0x35/0x80
[ 203.654608] entry_SYSCALL_64_after_hwframe+0x44/0xae

[ 203.666952] The buggy address belongs to the object at ffff888144df2000
which belongs to the cache kmalloc-256 of size 256
[ 203.669250] The buggy address is located 80 bytes inside of
256-byte region [ffff888144df2000, ffff888144df2100)
[ 203.671332] The buggy address belongs to the page:
[ 203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[ 203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[ 203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[ 203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[ 203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 203.679928] page dumped because: kasan: bad access detected

[ 203.681455] Memory state around the buggy address:
[ 203.682421] ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.683863] ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.686701] ^
[ 203.687820] ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 203.689226] ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 203.690620] ==================================================================

Fixes: ad11c4f1d8fd ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
/linux-master/drivers/leds/
H A Dled-core.cdiff 49e50aad Mon Oct 16 09:30:51 MDT 2023 Andy Shevchenko <andriy.shevchenko@linux.intel.com> leds: core: Refactor led_update_brightness() to use standard pattern

The standard conditional pattern is to check for errors first and
bail out if any. Refactor led_update_brightness() accordingly.

While at it, drop unneeded assignment and return 0 unconditionally
on success.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Denis Osterland-Heim <denis.osterland@diehl.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20231016153051.1409074-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
diff 22720a87 Wed May 10 10:22:33 MDT 2023 Hans de Goede <hdegoede@redhat.com> leds: Fix oops about sleeping in led_trigger_blink()

led_trigger_blink() calls led_blink_set() from a RCU read-side critical
section so led_blink_set() must not sleep. Note sleeping was not allowed
before the switch to RCU either because a spinlock was held before.

led_blink_set() does not sleep when sw-blinking is used, but
many LED controller drivers with hw blink support have a blink_set
function which may sleep, leading to an oops like this one:

[ 832.605062] ------------[ cut here ]------------
[ 832.605085] Voluntary context switch within RCU read-side critical section!
[ 832.605119] WARNING: CPU: 2 PID: 370 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4ee/0x690
<snip>
[ 832.606453] Call Trace:
[ 832.606466] <TASK>
[ 832.606487] __schedule+0x9f/0x1480
[ 832.606527] schedule+0x5d/0xe0
[ 832.606549] schedule_timeout+0x79/0x140
[ 832.606572] ? __pfx_process_timeout+0x10/0x10
[ 832.606599] wait_for_completion_timeout+0x6f/0x140
[ 832.606627] i2c_dw_xfer+0x101/0x460
[ 832.606659] ? psi_group_change+0x168/0x400
[ 832.606680] __i2c_transfer+0x172/0x6d0
[ 832.606709] i2c_smbus_xfer_emulated+0x27d/0x9c0
[ 832.606732] ? __schedule+0x430/0x1480
[ 832.606753] ? preempt_count_add+0x6a/0xa0
[ 832.606778] ? get_nohz_timer_target+0x18/0x190
[ 832.606796] ? lock_timer_base+0x61/0x80
[ 832.606817] ? preempt_count_add+0x6a/0xa0
[ 832.606842] __i2c_smbus_xfer+0xa2/0x3f0
[ 832.606862] i2c_smbus_xfer+0x66/0xf0
[ 832.606882] i2c_smbus_read_byte_data+0x41/0x70
[ 832.606901] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 832.606922] ? __pm_runtime_suspend+0x46/0xc0
[ 832.606946] cht_wc_byte_reg_read+0x2e/0x60
[ 832.606972] _regmap_read+0x5c/0x120
[ 832.606997] _regmap_update_bits+0x96/0xc0
[ 832.607023] regmap_update_bits_base+0x5b/0x90
[ 832.607053] cht_wc_leds_brightness_get+0x412/0x910 [leds_cht_wcove]
[ 832.607094] led_blink_setup+0x28/0x100
[ 832.607119] led_trigger_blink+0x40/0x70
[ 832.607145] power_supply_update_leds+0x1b7/0x1c0
[ 832.607174] power_supply_changed_work+0x67/0xe0
[ 832.607198] process_one_work+0x1c8/0x3c0
[ 832.607222] worker_thread+0x4d/0x380
[ 832.607243] ? __pfx_worker_thread+0x10/0x10
[ 832.607258] kthread+0xe9/0x110
[ 832.607279] ? __pfx_kthread+0x10/0x10
[ 832.607300] ret_from_fork+0x2c/0x50
[ 832.607337] </TASK>
[ 832.607344] ---[ end trace 0000000000000000 ]---

Add a new led_blink_set_nosleep() function which defers the actual
led_blink_set() call to a workqueue when necessary to fix this.

This also fixes an existing race where a pending led_set_brightness() has
been deferred to set_brightness_work and might then race with a later
led_cdev->blink_set() call. Note this race is only an issue with triggers
mixing led_trigger_event() and led_trigger_blink() calls, sysfs API
calls and led_trigger_blink_oneshot() are not affected.

Note rather then adding a separate blink_set_blocking callback this uses
the presence of the already existing brightness_set_blocking callback to
detect if the blinking call should be deferred to set_brightness_work.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Tested-by: Yauhen Kharuzhy <jekhor@gmail.com>
Link: https://lore.kernel.org/r/20230510162234.291439-4-hdegoede@redhat.com
Signed-off-by: Lee Jones <lee@kernel.org>
/linux-master/drivers/gpu/drm/amd/include/
H A Dvangogh_ip_offset.hdiff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
diff 6cda1dbc Mon Nov 23 04:19:08 MST 2020 Lee Jones <lee.jones@linaro.org> drm/amd/include/vangogh_ip_offset: Mark top-level IP_BASE as __maybe_unused

Fixes the following W=1 kernel build warning(s):

In file included from drivers/gpu/drm/amd/amdgpu/vangogh_reg_init.c:28:
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:210:29: warning: ‘USB_BASE’ defined but not used [-Wunused-const-variable=]
210 | static const struct IP_BASE USB_BASE = { { { { 0x0242A800, 0x05B00000, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:202:29: warning: ‘UMC_BASE’ defined but not used [-Wunused-const-variable=]
202 | static const struct IP_BASE UMC_BASE = { { { { 0x00014000, 0x02425800, 0, 0, 0, 0 } },
| ^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:178:29: warning: ‘PCIE0_BASE’ defined but not used [-Wunused-const-variable=]
178 | static const struct IP_BASE PCIE0_BASE = { { { { 0x00000000, 0x00000014, 0x00000D20, 0x00010400, 0x0241B000, 0x04040000 } },
| ^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../include/vangogh_ip_offset.h:154:29: warning: ‘MP2_BASE’ defined but not used [-Wunused-const-variable=]
154 | static const struct IP_BASE MP2_BASE = { { { { 0x00016400, 0x02400800, 0x00F40000, 0x00F80000, 0x00FC0000, 0 } },
| ^~~~~~~~

NB: Snipped lots of these

Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Huang Rui <ray.huang@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
/linux-master/drivers/remoteproc/
H A Dqcom_pil_info.cdiff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
diff fdc12231 Tue Nov 16 23:54:54 MST 2021 Stephen Boyd <swboyd@chromium.org> remoteproc: qcom: pil_info: Don't memcpy_toio more than is provided

If the string passed into qcom_pil_info_store() isn't as long as
PIL_RELOC_NAME_LEN we'll try to copy the string assuming the length is
PIL_RELOC_NAME_LEN to the io space and go beyond the bounds of the
string. Let's only copy as many byes as the string is long, ignoring the
NUL terminator.

This fixes the following KASAN error:

BUG: KASAN: global-out-of-bounds in __memcpy_toio+0x124/0x140
Read of size 1 at addr ffffffd35086e386 by task rmtfs/2392

CPU: 2 PID: 2392 Comm: rmtfs Tainted: G W 5.16.0-rc1-lockdep+ #10
Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
Call trace:
dump_backtrace+0x0/0x410
show_stack+0x24/0x30
dump_stack_lvl+0x7c/0xa0
print_address_description+0x78/0x2bc
kasan_report+0x160/0x1a0
__asan_report_load1_noabort+0x44/0x50
__memcpy_toio+0x124/0x140
qcom_pil_info_store+0x298/0x358 [qcom_pil_info]
q6v5_start+0xdf0/0x12e0 [qcom_q6v5_mss]
rproc_start+0x178/0x3a0
rproc_boot+0x5f0/0xb90
state_store+0x78/0x1bc
dev_attr_store+0x70/0x90
sysfs_kf_write+0xf4/0x118
kernfs_fop_write_iter+0x208/0x300
vfs_write+0x55c/0x804
ksys_pwrite64+0xc8/0x134
__arm64_compat_sys_aarch32_pwrite64+0xc4/0xdc
invoke_syscall+0x78/0x20c
el0_svc_common+0x11c/0x1f0
do_el0_svc_compat+0x50/0x60
el0_svc_compat+0x5c/0xec
el0t_32_sync_handler+0xc0/0xf0
el0t_32_sync+0x1a4/0x1a8

The buggy address belongs to the variable:
.str.59+0x6/0xffffffffffffec80 [qcom_q6v5_mss]

Memory state around the buggy address:
ffffffd35086e280: 00 00 00 00 02 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
ffffffd35086e300: 00 02 f9 f9 f9 f9 f9 f9 00 00 00 06 f9 f9 f9 f9
>ffffffd35086e380: 06 f9 f9 f9 05 f9 f9 f9 00 00 00 00 00 06 f9 f9
^
ffffffd35086e400: f9 f9 f9 f9 01 f9 f9 f9 04 f9 f9 f9 00 00 01 f9
ffffffd35086e480: f9 f9 f9 f9 00 00 00 00 00 00 00 01 f9 f9 f9 f9

Fixes: 549b67da660d ("remoteproc: qcom: Introduce helper to store pil info in IMEM")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Link: https://lore.kernel.org/r/20211117065454.4142936-1-swboyd@chromium.org
/linux-master/tools/testing/selftests/tc-testing/tc-tests/actions/
H A Dconnmark.jsondiff 0fc86746 Thu Sep 08 19:29:32 MDT 2022 Zhengchao Shao <shaozhengchao@huawei.com> selftests/tc-testings: add connmark action deleting test case

Test 6571: Delete connmark action with valid index
Test 3426: Delete connmark action with invalid index

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff c53075ea Wed Mar 20 08:00:05 MDT 2019 Davide Caratti <dcaratti@redhat.com> net/sched: act_connmark: validate the control action inside init()

the following script:

# tc qdisc add dev crash0 clsact
# tc filter add dev crash0 egress matchall \
> action connmark pass index 90
# tc actions replace action connmark \
> goto chain 42 index 90 cookie c1a0c1a0
# tc actions show action connmark

had the following output:

Error: Failed to init TC action chain.
We have an error talking to the kernel
total acts 1

action order 0: connmark zone 0 goto chain 42
index 90 ref 2 bind 1
cookie c1a0c1a0

Then, the first packet transmitted by crash0 made the kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
#PF error: [normal kernel read fault]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 302 Comm: kworker/0:2 Not tainted 5.0.0-rc4.gotochain_crash+ #533
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:tcf_action_exec+0xb8/0x100
Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
RSP: 0018:ffff9bea406c3ad0 EFLAGS: 00010246
RAX: 000000002000002a RBX: ffff8c5dfc009f00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9bea406c3a80 RDI: ffff8c5dfb9d6ec0
RBP: ffff9bea406c3b70 R08: ffff8c5dfda222a0 R09: ffffffff90933c3c
R10: 0000000000000000 R11: 0000000092793f7d R12: ffff8c5df48b3c00
R13: ffff8c5df48b3c08 R14: 0000000000000001 R15: ffff8c5dfb9d6e40
FS: 0000000000000000(0000) GS:ffff8c5dfda00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000062e0e006 CR4: 00000000001606f0
Call Trace:
tcf_classify+0x58/0x120
__dev_queue_xmit+0x40a/0x890
? ndisc_next_option+0x50/0x50
? ___neigh_create+0x4d5/0x680
? ip6_finish_output2+0x1b5/0x590
ip6_finish_output2+0x1b5/0x590
? ip6_output+0x68/0x110
ip6_output+0x68/0x110
? nf_hook.constprop.28+0x79/0xc0
ndisc_send_skb+0x248/0x2e0
ndisc_send_ns+0xf8/0x200
? addrconf_dad_work+0x389/0x4b0
addrconf_dad_work+0x389/0x4b0
? __switch_to_asm+0x34/0x70
? process_one_work+0x195/0x380
? addrconf_dad_completed+0x370/0x370
process_one_work+0x195/0x380
worker_thread+0x30/0x390
? process_one_work+0x380/0x380
kthread+0x113/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 crct10dif_pclmul mbcache crc32_pclmul jbd2 snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd cryptd snd glue_helper joydev virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs ata_generic pata_acpi qxl drm_kms_helper virtio_net net_failover syscopyarea virtio_blk failover virtio_console sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix crc32c_intel serio_raw libata virtio_pci virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000000000000

Validating the control action within tcf_connmark_init() proved to fix the
above issue. A TDC selftest is added to verify the correct behavior.

Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Fixes: 97763dc0f401 ("net_sched: reject unknown tcfa_action values")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
/linux-master/net/nfc/
H A Dllcp.hdiff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6709d4b7 Sun Jun 25 03:10:07 MDT 2023 Lin Ma <linma@zju.edu.cn> net: nfc: Fix use-after-free caused by nfc_llcp_find_local

This commit fixes several use-after-free that caused by function
nfc_llcp_find_local(). For example, one UAF can happen when below buggy
time window occurs.

// nfc_genl_llc_get_params | // nfc_unregister_device
|
dev = nfc_get_device(idx); | device_lock(...)
if (!dev) | dev->shutting_down = true;
return -ENODEV; | device_unlock(...);
|
device_lock(...); | // nfc_llcp_unregister_device
| nfc_llcp_find_local()
nfc_llcp_find_local(...); |
| local_cleanup()
if (!local) { |
rc = -ENODEV; | // nfc_llcp_local_put
goto exit; | kref_put(.., local_release)
} |
| // local_release
| list_del(&local->list)
// nfc_genl_send_params | kfree()
local->dev->idx !!!UAF!!! |
|

and the crash trace for the one of the discussed UAF like:

BUG: KASAN: slab-use-after-free in nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
Read of size 8 at addr ffff888105b0e410 by task 20114

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x72/0xa0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:319 [inline]
print_report+0xcc/0x620 mm/kasan/report.c:430
kasan_report+0xb2/0xe0 mm/kasan/report.c:536
nfc_genl_send_params net/nfc/netlink.c:999 [inline]
nfc_genl_llc_get_params+0x72f/0x780 net/nfc/netlink.c:1045
genl_family_rcv_msg_doit.isra.0+0x1ee/0x2e0 net/netlink/genetlink.c:968
genl_family_rcv_msg net/netlink/genetlink.c:1048 [inline]
genl_rcv_msg+0x503/0x7d0 net/netlink/genetlink.c:1065
netlink_rcv_skb+0x161/0x430 net/netlink/af_netlink.c:2548
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x644/0x900 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x934/0xe70 net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0x1b6/0x200 net/socket.c:747
____sys_sendmsg+0x6e9/0x890 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmsg+0xf7/0x1d0 net/socket.c:2584
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f34640a2389
RSP: 002b:00007f3463415168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f34641c1f80 RCX: 00007f34640a2389
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000006
RBP: 00007f34640ed493 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe38449ecf R14: 00007f3463415300 R15: 0000000000022000
</TASK>

Allocated by task 20116:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0x7f/0x90 mm/kasan/common.c:383
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
nfc_llcp_register_device+0x49/0xa40 net/nfc/llcp_core.c:1567
nfc_register_device+0x61/0x260 net/nfc/core.c:1124
nci_register_device+0x776/0xb20 net/nfc/nci/core.c:1257
virtual_ncidev_open+0x147/0x230 drivers/nfc/virtual_ncidev.c:148
misc_open+0x379/0x4a0 drivers/char/misc.c:165
chrdev_open+0x26c/0x780 fs/char_dev.c:414
do_dentry_open+0x6c4/0x12a0 fs/open.c:920
do_open fs/namei.c:3560 [inline]
path_openat+0x24fe/0x37e0 fs/namei.c:3715
do_filp_open+0x1ba/0x410 fs/namei.c:3742
do_sys_openat2+0x171/0x4c0 fs/open.c:1356
do_sys_open fs/open.c:1372 [inline]
__do_sys_openat fs/open.c:1388 [inline]
__se_sys_openat fs/open.c:1383 [inline]
__x64_sys_openat+0x143/0x200 fs/open.c:1383
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 20115:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x50 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free mm/kasan/common.c:200 [inline]
__kasan_slab_free+0x10a/0x190 mm/kasan/common.c:244
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook mm/slub.c:1807 [inline]
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0x7a/0x190 mm/slub.c:3800
local_release net/nfc/llcp_core.c:174 [inline]
kref_put include/linux/kref.h:65 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:182 [inline]
nfc_llcp_local_put net/nfc/llcp_core.c:177 [inline]
nfc_llcp_unregister_device+0x206/0x290 net/nfc/llcp_core.c:1620
nfc_unregister_device+0x160/0x1d0 net/nfc/core.c:1179
virtual_ncidev_close+0x52/0xa0 drivers/nfc/virtual_ncidev.c:163
__fput+0x252/0xa20 fs/file_table.c:321
task_work_run+0x174/0x270 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x108/0x110 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x21/0x50 kernel/entry/common.c:297
do_syscall_64+0x4c/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc

Last potentially related work creation:
kasan_save_stack+0x22/0x50 mm/kasan/common.c:45
__kasan_record_aux_stack+0x95/0xb0 mm/kasan/generic.c:491
kvfree_call_rcu+0x29/0xa80 kernel/rcu/tree.c:3328
drop_sysctl_table+0x3be/0x4e0 fs/proc/proc_sysctl.c:1735
unregister_sysctl_table.part.0+0x9c/0x190 fs/proc/proc_sysctl.c:1773
unregister_sysctl_table+0x24/0x30 fs/proc/proc_sysctl.c:1753
neigh_sysctl_unregister+0x5f/0x80 net/core/neighbour.c:3895
addrconf_notify+0x140/0x17b0 net/ipv6/addrconf.c:3684
notifier_call_chain+0xbe/0x210 kernel/notifier.c:87
call_netdevice_notifiers_info+0xb5/0x150 net/core/dev.c:1937
call_netdevice_notifiers_extack net/core/dev.c:1975 [inline]
call_netdevice_notifiers net/core/dev.c:1989 [inline]
dev_change_name+0x3c3/0x870 net/core/dev.c:1211
dev_ifsioc+0x800/0xf70 net/core/dev_ioctl.c:376
dev_ioctl+0x3d9/0xf80 net/core/dev_ioctl.c:542
sock_do_ioctl+0x160/0x260 net/socket.c:1213
sock_ioctl+0x3f9/0x670 net/socket.c:1316
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x19e/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3f/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The buggy address belongs to the object at ffff888105b0e400
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 16 bytes inside of
freed 1024-byte region [ffff888105b0e400, ffff888105b0e800)

The buggy address belongs to the physical page:
head:ffffea000416c200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffff8881000430c0 ffffea00044c7010 ffffea0004510e10
raw: 0000000000000000 00000000000a000a 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff888105b0e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888105b0e380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888105b0e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888105b0e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888105b0e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

In summary, this patch solves those use-after-free by

1. Re-implement the nfc_llcp_find_local(). The current version does not
grab the reference when getting the local from the linked list. For
example, the llcp_sock_bind() gets the reference like below:

// llcp_sock_bind()

local = nfc_llcp_find_local(dev); // A
..... \
| raceable
..... /
llcp_sock->local = nfc_llcp_local_get(local); // B

There is an apparent race window that one can drop the reference
and free the local object fetched in (A) before (B) gets the reference.

2. Some callers of the nfc_llcp_find_local() do not grab the reference
at all. For example, the nfc_genl_llc_{{get/set}_params/sdreq} functions.
We add the nfc_llcp_local_put() for them. Moreover, we add the necessary
error handling function to put the reference.

3. Add the nfc_llcp_remove_local() helper. The local object is removed
from the linked list in local_release() when all reference is gone. This
patch removes it when nfc_llcp_unregister_device() is called.

Therefore, every caller of nfc_llcp_find_local() will get a reference
even when the nfc_llcp_unregister_device() is called. This promises no
use-after-free for the local object is ever possible.

Fixes: 52feb444a903 ("NFC: Extend netlink interface for LTO, RW, and MIUX parameters support")
Fixes: c7aa12252f51 ("NFC: Take a reference on the LLCP local pointer when creating a socket")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
/linux-master/drivers/gpu/drm/nouveau/nvkm/core/
H A Dfirmware.cdiff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
diff 52a6947b Mon Apr 29 12:23:08 MDT 2024 Lyude Paul <lyude@redhat.com> drm/nouveau/firmware: Fix SG_DEBUG error with nvkm_firmware_ctor()

Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:

kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>

We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.

So, fix this by allocating the memory with vmalloc instead().

V2:
* Fixup explanation as the prior one was bogus

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
/linux-master/arch/arm64/kvm/
H A Dpkvm.cdiff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
diff fa729bc7 Tue Jul 04 13:32:43 MDT 2023 Sudeep Holla <sudeep.holla@arm.com> KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm

Currently there is no synchronisation between finalize_pkvm() and
kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if
kvm_arm_init() fails resulting in the following warning on all the CPUs
and eventually a HYP panic:

| kvm [1]: IPA Size Limit: 48 bits
| kvm [1]: Failed to init hyp memory protection
| kvm [1]: error initializing Hyp mode: -22
|
| <snip>
|
| WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
| pc : _kvm_host_prot_finalize+0x30/0x50
| lr : __flush_smp_call_function_queue+0xd8/0x230
|
| Call trace:
| _kvm_host_prot_finalize+0x3c/0x50
| on_each_cpu_cond_mask+0x3c/0x6c
| pkvm_drop_host_privileges+0x4c/0x78
| finalize_pkvm+0x3c/0x5c
| do_one_initcall+0xcc/0x240
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x100/0x16c
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
| Failed to finalize Hyp protection: -22
| dtb=fvp-base-revc.dtb
| kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace:
| kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8
| kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac
| kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160
| kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4
| kvm [95]: ---[ end nVHE call trace ]---
| kvm [95]: Hyp Offset: 0xfffe8db00ffa0000
| Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000
| CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237
| Hardware name: FVP Base RevC (DT)
| Workqueue: rpciod rpc_async_schedule
| Call trace:
| dump_backtrace+0xec/0x108
| show_stack+0x18/0x2c
| dump_stack_lvl+0x50/0x68
| dump_stack+0x18/0x24
| panic+0x138/0x33c
| nvhe_hyp_panic_handler+0x100/0x184
| new_slab+0x23c/0x54c
| ___slab_alloc+0x3e4/0x770
| kmem_cache_alloc_node+0x1f0/0x278
| __alloc_skb+0xdc/0x294
| tcp_stream_alloc_skb+0x2c/0xf0
| tcp_sendmsg_locked+0x3d0/0xda4
| tcp_sendmsg+0x38/0x5c
| inet_sendmsg+0x44/0x60
| sock_sendmsg+0x1c/0x34
| xprt_sock_sendmsg+0xdc/0x274
| xs_tcp_send_request+0x1ac/0x28c
| xprt_transmit+0xcc/0x300
| call_transmit+0x78/0x90
| __rpc_execute+0x114/0x3d8
| rpc_async_schedule+0x28/0x48
| process_one_work+0x1d8/0x314
| worker_thread+0x248/0x474
| kthread+0xfc/0x184
| ret_from_fork+0x10/0x20
| SMP: stopping secondary CPUs
| Kernel Offset: 0x57c5cb460000 from 0xffff800080000000
| PHYS_OFFSET: 0x80000000
| CPU features: 0x00000000,1035b7a3,ccfe773f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:
| PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800
| FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000
| VCPU:0000000000000000 ]---

Fix it by checking for the successfull initialisation of kvm_arm_init()
in finalize_pkvm() before proceeding any futher.

Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege")
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
/linux-master/drivers/net/ethernet/intel/fm10k/
H A Dfm10k_debugfs.c7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7461fd91 Sat Sep 20 17:53:23 MDT 2014 Alexander Duyck <alexander.h.duyck@intel.com> fm10k: Add support for debugfs

This patch adds limited debugfs support for the driver. Most of the
functionality needed for dumping registers is already provided via ethtool.
The only thing we saw that we really neeed was the ability to dump the
descriptor rings so as such this patch will add a fm10k directory containing a
listing of directories each one with a unique PCI Bus, Device, and Function
number. Each of those BDF directories will have a list of q_vectors, and
the q_vectors will contain a file for each of the Rx/Tx rings that are a part
of the vector. For example:

# ls -RD /sys/kernel/debug/fm10k/
/sys/kernel/debug/fm10k/:
0000:01:00.0

/sys/kernel/debug/fm10k/0000:01:00.0:
q_vector.000 q_vector.001 q_vector.002 q_vector.003

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000:
rx_ring.000 tx_ring.000

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.001:
rx_ring.001 tx_ring.001

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.002:
rx_ring.002 tx_ring.002

/sys/kernel/debug/fm10k/0000:01:00.0/q_vector.003:
rx_ring.003 tx_ring.003

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/rx_ring.000
DES DATA RSS STATERR LENGTH VLAN DGLORT SGLORT TIMESTAMP
---------------------------------------------------------------------------
000 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x13951807dc4fedf0
001 0x00000000 0x00000000 0x00000003 0x002a 0x0000 0x0000 0x0000 0x1395180906c9f2c8
002 0x3731c000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
003 0x3731d000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
004 0xaab3a000 0x00000000 0x00000000 0x0000 0x0000 0x0000 0x0000 0x0000000000000000
...

# cat /sys/kernel/debug/fm10k/0000:01:00.0/q_vector.000/tx_ring.000
DES BUFFER_ADDRESS LENGTH VLAN MSS HDRLEN FLAGS
---------------------------------------------------------
000 0x00000000aa8a1002 0x005a 0x0000 0x0000 0x0000 0xc0
001 0x00000000aa8a2002 0x005a 0x0000 0x0000 0x0000 0xc0
002 0x000000006bc13202 0x004e 0x0000 0x0000 0x0000 0xc0
003 0x000000006bc13c02 0x002a 0x0000 0x0000 0x0000 0xe1
004 0x000000006bc13602 0x0062 0x0000 0x0000 0x0000 0xc0

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
/linux-master/drivers/net/ethernet/mediatek/
H A Dmtk_wed_wo.cdiff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
diff 65e6af6c Thu Dec 01 08:26:53 MST 2022 Lorenzo Bianconi <lorenzo@kernel.org> net: ethernet: mtk_wed: fix sleep while atomic in mtk_wed_wo_queue_refill

In order to fix the following sleep while atomic bug always alloc pages
with GFP_ATOMIC in mtk_wed_wo_queue_refill since page_frag_alloc runs in
spin_lock critical section.

[ 9.049719] Hardware name: MediaTek MT7986a RFB (DT)
[ 9.054665] Call trace:
[ 9.057096] dump_backtrace+0x0/0x154
[ 9.060751] show_stack+0x14/0x1c
[ 9.064052] dump_stack_lvl+0x64/0x7c
[ 9.067702] dump_stack+0x14/0x2c
[ 9.071001] ___might_sleep+0xec/0x120
[ 9.074736] __might_sleep+0x4c/0x9c
[ 9.078296] __alloc_pages+0x184/0x2e4
[ 9.082030] page_frag_alloc_align+0x98/0x1ac
[ 9.086369] mtk_wed_wo_queue_refill+0x134/0x234
[ 9.090974] mtk_wed_wo_init+0x174/0x2c0
[ 9.094881] mtk_wed_attach+0x7c8/0x7e0
[ 9.098701] mt7915_mmio_wed_init+0x1f0/0x3a0 [mt7915e]
[ 9.103940] mt7915_pci_probe+0xec/0x3bc [mt7915e]
[ 9.108727] pci_device_probe+0xac/0x13c
[ 9.112638] really_probe.part.0+0x98/0x2f4
[ 9.116807] __driver_probe_device+0x94/0x13c
[ 9.121147] driver_probe_device+0x40/0x114
[ 9.125314] __driver_attach+0x7c/0x180
[ 9.129133] bus_for_each_dev+0x5c/0x90
[ 9.132953] driver_attach+0x20/0x2c
[ 9.136513] bus_add_driver+0x104/0x1fc
[ 9.140333] driver_register+0x74/0x120
[ 9.144153] __pci_register_driver+0x40/0x50
[ 9.148407] mt7915_init+0x5c/0x1000 [mt7915e]
[ 9.152848] do_one_initcall+0x40/0x25c
[ 9.156669] do_init_module+0x44/0x230
[ 9.160403] load_module+0x1f30/0x2750
[ 9.164135] __do_sys_init_module+0x150/0x200
[ 9.168475] __arm64_sys_init_module+0x18/0x20
[ 9.172901] invoke_syscall.constprop.0+0x4c/0xe0
[ 9.177589] do_el0_svc+0x48/0xe0
[ 9.180889] el0_svc+0x14/0x50
[ 9.183929] el0t_64_sync_handler+0x9c/0x120
[ 9.188183] el0t_64_sync+0x158/0x15c

Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/67ca94bdd3d9eaeb86e52b3050fbca0bcf7bb02f.1669908312.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
/linux-master/fs/ntfs3/
H A Dattrlist.cdiff 4cdfb6e7 Thu Dec 21 03:59:43 MST 2023 Konstantin Komarov <almaz.alexandrovich@paragon-software.com> fs/ntfs3: Disable ATTR_LIST_ENTRY size check

The use of sizeof(struct ATTR_LIST_ENTRY) has been replaced with le_size(0)
due to alignment peculiarities on different platforms.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312071005.g6YrbaIe-lkp@intel.com/
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
diff 6db62086 Fri Aug 05 10:47:27 MDT 2022 Edward Lo <edward.lo@ambergroup.io> fs/ntfs3: Validate data run offset

This adds sanity checks for data run offset. We should make sure data
run offset is legit before trying to unpack them, otherwise we may
encounter use-after-free or some unexpected memory access behaviors.

[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570
[ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240
[ 82.941670]
[ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15
[ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 82.943720] Call Trace:
[ 82.944204] <TASK>
[ 82.944471] dump_stack_lvl+0x49/0x63
[ 82.944908] print_report.cold+0xf5/0x67b
[ 82.945141] ? __wait_on_bit+0x106/0x120
[ 82.945750] ? run_unpack+0x2e3/0x570
[ 82.946626] kasan_report+0xa7/0x120
[ 82.947046] ? run_unpack+0x2e3/0x570
[ 82.947280] __asan_load1+0x51/0x60
[ 82.947483] run_unpack+0x2e3/0x570
[ 82.947709] ? memcpy+0x4e/0x70
[ 82.947927] ? run_pack+0x7a0/0x7a0
[ 82.948158] run_unpack_ex+0xad/0x3f0
[ 82.948399] ? mi_enum_attr+0x14a/0x200
[ 82.948717] ? run_unpack+0x570/0x570
[ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0
[ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0
[ 82.949611] ? mi_read+0x262/0x2c0
[ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180
[ 82.950249] ntfs_iget5+0x632/0x1870
[ 82.950621] ? ntfs_get_block_bmap+0x70/0x70
[ 82.951192] ? evict+0x223/0x280
[ 82.951525] ? iput.part.0+0x286/0x320
[ 82.951969] ntfs_fill_super+0x1321/0x1e20
[ 82.952436] ? put_ntfs+0x1d0/0x1d0
[ 82.952822] ? vsprintf+0x20/0x20
[ 82.953188] ? mutex_unlock+0x81/0xd0
[ 82.953379] ? set_blocksize+0x95/0x150
[ 82.954001] get_tree_bdev+0x232/0x370
[ 82.954438] ? put_ntfs+0x1d0/0x1d0
[ 82.954700] ntfs_fs_get_tree+0x15/0x20
[ 82.955049] vfs_get_tree+0x4c/0x130
[ 82.955292] path_mount+0x645/0xfd0
[ 82.955615] ? putname+0x80/0xa0
[ 82.955955] ? finish_automount+0x2e0/0x2e0
[ 82.956310] ? kmem_cache_free+0x110/0x390
[ 82.956723] ? putname+0x80/0xa0
[ 82.957023] do_mount+0xd6/0xf0
[ 82.957411] ? path_mount+0xfd0/0xfd0
[ 82.957638] ? __kasan_check_write+0x14/0x20
[ 82.957948] __x64_sys_mount+0xca/0x110
[ 82.958310] do_syscall_64+0x3b/0x90
[ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 82.959341] RIP: 0033:0x7fd0d1ce948a
[ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008
[ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a
[ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0
[ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020
[ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0
[ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff

Signed-off-by: Edward Lo <edward.lo@ambergroup.io>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
/linux-master/include/keys/
H A Duser-type.hdiff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
diff d3ec10aa Sat Mar 21 19:11:24 MDT 2020 Waiman Long <longman@redhat.com> KEYS: Don't write out to userspace while holding key semaphore

A lockdep circular locking dependency report was seen when running a
keyutils test:

[12537.027242] ======================================================
[12537.059309] WARNING: possible circular locking dependency detected
[12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - -
[12537.125253] ------------------------------------------------------
[12537.153189] keyctl/25598 is trying to acquire lock:
[12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0
[12537.208365]
[12537.208365] but task is already holding lock:
[12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12537.270476]
[12537.270476] which lock already depends on the new lock.
[12537.270476]
[12537.307209]
[12537.307209] the existing dependency chain (in reverse order) is:
[12537.340754]
[12537.340754] -> #3 (&type->lock_class){++++}:
[12537.367434] down_write+0x4d/0x110
[12537.385202] __key_link_begin+0x87/0x280
[12537.405232] request_key_and_link+0x483/0xf70
[12537.427221] request_key+0x3c/0x80
[12537.444839] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.496731] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.601045] kthread+0x30c/0x3d0
[12537.617906] ret_from_fork+0x3a/0x50
[12537.636225]
[12537.636225] -> #2 (root_key_user.cons_lock){+.+.}:
[12537.664525] __mutex_lock+0x105/0x11f0
[12537.683734] request_key_and_link+0x35a/0xf70
[12537.705640] request_key+0x3c/0x80
[12537.723304] dns_query+0x1db/0x5a5 [dns_resolver]
[12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs]
[12537.775607] cifs_reconnect+0xe04/0x2500 [cifs]
[12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs]
[12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs]
[12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs]
[12537.873477] kthread+0x30c/0x3d0
[12537.890281] ret_from_fork+0x3a/0x50
[12537.908649]
[12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}:
[12537.935225] __mutex_lock+0x105/0x11f0
[12537.954450] cifs_call_async+0x102/0x7f0 [cifs]
[12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs]
[12538.000659] cifs_readpages+0x120a/0x1e50 [cifs]
[12538.023920] read_pages+0xf5/0x560
[12538.041583] __do_page_cache_readahead+0x41d/0x4b0
[12538.067047] ondemand_readahead+0x44c/0xc10
[12538.092069] filemap_fault+0xec1/0x1830
[12538.111637] __do_fault+0x82/0x260
[12538.129216] do_fault+0x419/0xfb0
[12538.146390] __handle_mm_fault+0x862/0xdf0
[12538.167408] handle_mm_fault+0x154/0x550
[12538.187401] __do_page_fault+0x42f/0xa60
[12538.207395] do_page_fault+0x38/0x5e0
[12538.225777] page_fault+0x1e/0x30
[12538.243010]
[12538.243010] -> #0 (&mm->mmap_sem){++++}:
[12538.267875] lock_acquire+0x14c/0x420
[12538.286848] __might_fault+0x119/0x1b0
[12538.306006] keyring_read_iterator+0x7e/0x170
[12538.327936] assoc_array_subtree_iterate+0x97/0x280
[12538.352154] keyring_read+0xe9/0x110
[12538.370558] keyctl_read_key+0x1b9/0x220
[12538.391470] do_syscall_64+0xa5/0x4b0
[12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[12538.435535]
[12538.435535] other info that might help us debug this:
[12538.435535]
[12538.472829] Chain exists of:
[12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class
[12538.472829]
[12538.524820] Possible unsafe locking scenario:
[12538.524820]
[12538.551431] CPU0 CPU1
[12538.572654] ---- ----
[12538.595865] lock(&type->lock_class);
[12538.613737] lock(root_key_user.cons_lock);
[12538.644234] lock(&type->lock_class);
[12538.672410] lock(&mm->mmap_sem);
[12538.687758]
[12538.687758] *** DEADLOCK ***
[12538.687758]
[12538.714455] 1 lock held by keyctl/25598:
[12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220
[12538.770573]
[12538.770573] stack backtrace:
[12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G
[12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015
[12538.881963] Call Trace:
[12538.892897] dump_stack+0x9a/0xf0
[12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279
[12538.932891] ? save_trace+0xd6/0x250
[12538.948979] check_prev_add.constprop.32+0xc36/0x14f0
[12538.971643] ? keyring_compare_object+0x104/0x190
[12538.992738] ? check_usage+0x550/0x550
[12539.009845] ? sched_clock+0x5/0x10
[12539.025484] ? sched_clock_cpu+0x18/0x1e0
[12539.043555] __lock_acquire+0x1f12/0x38d0
[12539.061551] ? trace_hardirqs_on+0x10/0x10
[12539.080554] lock_acquire+0x14c/0x420
[12539.100330] ? __might_fault+0xc4/0x1b0
[12539.119079] __might_fault+0x119/0x1b0
[12539.135869] ? __might_fault+0xc4/0x1b0
[12539.153234] keyring_read_iterator+0x7e/0x170
[12539.172787] ? keyring_read+0x110/0x110
[12539.190059] assoc_array_subtree_iterate+0x97/0x280
[12539.211526] keyring_read+0xe9/0x110
[12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0
[12539.249076] keyctl_read_key+0x1b9/0x220
[12539.266660] do_syscall_64+0xa5/0x4b0
[12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf

One way to prevent this deadlock scenario from happening is to not
allow writing to userspace while holding the key semaphore. Instead,
an internal buffer is allocated for getting the keys out from the
read method first before copying them out to userspace without holding
the lock.

That requires taking out the __user modifier from all the relevant
read methods as well as additional changes to not use any userspace
write helpers. That is,

1) The put_user() call is replaced by a direct copy.
2) The copy_to_user() call is replaced by memcpy().
3) All the fault handling code is removed.

Compiling on a x86-64 system, the size of the rxrpc_read() function is
reduced from 3795 bytes to 2384 bytes with this patch.

Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>

Completed in 1442 milliseconds

1234567891011>>